Ad Hoc Data and the Token Ambiguity Problem

Authors

  • Qian Xi
  • Kathleen Fisher
  • David Walker
  • Kenny Q. Zhu
Abstract

PADS is a declarative language used to describe the syntax and semantic properties of ad hoc data sources such as financial transactions, server logs and scientific data sets. The PADS compiler reads these descriptions and generates a suite of data processing tools, such as format translators, parsers, printers and even a query engine, all customized to the ad hoc data format in question. Recently, to further improve the productivity of programmers who manage ad hoc data sources, we have turned to using PADS as an intermediate language in a system that first infers a PADS description directly from example data and then passes that description to the original compiler for tool generation. A key subproblem in the inference engine is the token ambiguity problem: determining which substrings in the example data correspond to complex tokens such as dates, URLs, or comments. To solve the token ambiguity problem, the paper studies the relative effectiveness of three different statistical models for tokenizing ad hoc data. It also shows how to incorporate these models into a general and effective format inference algorithm. In addition to using a declarative language (PADS) as a key intermediate form, we have implemented the system as a whole in ML.
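To make the token ambiguity problem concrete, the following minimal Python sketch enumerates every way a string can be split into tokens under a small, hypothetical set of token classes. The names `TOKEN_PATTERNS` and `tokenizations`, and the regular expressions themselves, are assumptions made for illustration; they are not the PADS base types, and the sketch does not implement any of the statistical models the paper compares. It only shows why a choice between candidate tokenizations is needed.

```python
import re

# Hypothetical token classes, for illustration only (not the PADS token set).
TOKEN_PATTERNS = {
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "int": re.compile(r"\d+"),
    "punct": re.compile(r"[-/:.]"),
    "word": re.compile(r"[A-Za-z]+"),
}


def tokenizations(text, start=0):
    """Yield each way to split text[start:] into (token_type, lexeme) pairs.

    At every position, each token class contributes at most one candidate:
    its regex match anchored at that position.
    """
    if start == len(text):
        yield []
        return
    for name, pattern in TOKEN_PATTERNS.items():
        match = pattern.match(text, start)
        if match and match.end() > start:
            for rest in tokenizations(text, match.end()):
                yield [(name, match.group())] + rest


if __name__ == "__main__":
    # The same substring can be read as one date or as int/punct pieces.
    for candidate in tokenizations("2009-03-14"):
        print(candidate)
```

Under these assumed token classes, the string `2009-03-14` has two readings: a single date token, or three integers separated by punctuation. A tokenizer for ad hoc data has to pick one, which is the decision a statistical model would be asked to make.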


Similar resources

T-MAH: A Token Passing MAC protocol for Ad Hoc Networks

The Token Passing MAC protocol for Ad Hoc networks (T-MAH), discussed in this paper, is a distributed medium access protocol designed for wireless multi-hop networks. With the T-MAH access scheme, the network is organized in clusters (called Token Groups), each with a Token Group Head as the leader of the group. Within each cluster a token-based technique is used, i.e. each node in the cluster is allow...


A new solution to the h-out of-k problem in mobile ad hoc networks

In this paper, we describe a new token-based h-out of-k mutual exclusion solution for mobile ad hoc networks. This protocol uses neither the routing layer nor a logical structure, and it grants requests based on their distance from the token, their age, and their number of resources. A request is sent on the routes of the nodes for which a request is present in the local queue, with a dynamic...


Broadcast Routing in Wireless Ad-Hoc Networks: A Particle Swarm optimization Approach

While routing in multi-hop packet radio networks (static Ad-hoc wireless networks), it is crucial to minimize power consumption since nodes are powered by batteries of limited capacity and it is expensive to recharge the device. This paper studies the problem of broadcast routing in radio networks. Given a network with an identified source node, any broadcast routing is considered as a directed...


A Group Mutual Exclusion Algorithm for Ad Hoc Mobile Networks

In this paper, we propose a token based algorithm to solve the group mutual exclusion (GME) problem for ad hoc mobile networks. The proposed algorithm is adapted from the RL algorithm in [WWV98] and utilizes the concept of weight throwing in [Tse95]. We prove that the proposed algorithm satisfies the mutual exclusion, the bounded delay, and the concurrent entering properties. The proposed algor...


Energy Efficient Routing in Mobile Ad Hoc Networks by Using Honey Bee Mating Optimization

Mobile ad hoc networks (MANETs) are composed of mobile stations communicating through wireless links, without any fixed backbone support. In these networks, a limited energy supply and frequent topology changes caused by node mobility make routing a challenging problem. TORA is one of the routing protocols that successfully copes with the side effects of node mobility, but it do...



Publication date: 2009